Model editing
The goal of model editing is to use a single input / desired-output pair to alter a base model's output for that input as well as for its equivalence neighborhood (related input/output pairs), while leaving model behavior on unrelated inputs unchanged (Mitchell2022fast).
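A minimal sketch of how such an edit is usually scored, assuming exact-match string outputs and illustrative names (nothing below comes from Mitchell2022fast): success on the edit pair itself, generalization to its equivalence neighborhood, and locality on unrelated inputs.

```python
from typing import Callable, Dict, List, Tuple

def evaluate_edit(
    edited_model: Callable[[str], str],
    base_model: Callable[[str], str],
    edit_pair: Tuple[str, str],
    neighborhood: List[Tuple[str, str]],   # paraphrases of the edit input
    unrelated: List[str],                  # inputs the edit must not affect
) -> Dict[str, float]:
    """Score an edit: did it take, does it generalize, is it local?"""
    x_e, y_e = edit_pair
    success = float(edited_model(x_e) == y_e)
    gen = sum(edited_model(x) == y for x, y in neighborhood) / max(len(neighborhood), 1)
    loc = sum(edited_model(x) == base_model(x) for x in unrelated) / max(len(unrelated), 1)
    return {"edit_success": success, "generalization": gen, "locality": loc}
```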
Does model editing actually 'update' the "knowledge" that the model has? When a human changes a belief, that can trigger a cascade of updates to related beliefs. Does the same happen in LLMs?
Why doesn't fine-tuning work? "Fine-tuning on a single example tends to overfit." (Mitchell2022fast; see also Zhu et al. 2020 and De Cao et al. 2021.)
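A sketch of that naive baseline on a toy linear model rather than a real LM (the setup is illustrative, not taken from the cited papers): repeatedly descending on the single edit example drives its loss to zero, but nothing constrains behavior on other inputs.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 4)        # stand-in for a large LM
x_edit = torch.randn(1, 8)           # the single edit input
y_edit = torch.tensor([2])           # the desired output
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):                 # many steps on one example
    loss = torch.nn.functional.cross_entropy(model(x_edit), y_edit)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The edit "takes", but the update is unconstrained elsewhere: outputs on
# unrelated inputs drift, which is the overfitting problem editors try to avoid.
```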
An early study: Sinitsin et al. 2020, but it is not very efficient.
De Cao et al. 2021: more efficient, but fails in practice.
Mitchell2022fast proposes MEND (Model Editor Networks using Gradient Decomposition).
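The structure MEND builds on: for a linear layer, the per-example gradient of the loss with respect to the weight matrix is a rank-1 outer product of the output gradient and the layer input, so an editor network can operate on those two small vectors instead of the full weight matrix. A rough sketch of that structure is below; the `editor_x` / `editor_d` networks are placeholders, not the trained MEND hypernetwork.

```python
import torch

torch.manual_seed(0)
d_in, d_out = 8, 4
layer = torch.nn.Linear(d_in, d_out, bias=False)
x = torch.randn(1, d_in)
y_target = torch.tensor([2])

out = layer(x)
loss = torch.nn.functional.cross_entropy(out, y_target)
delta = torch.autograd.grad(loss, out, retain_graph=True)[0]   # dL/d(out)
grad_W = torch.autograd.grad(loss, layer.weight)[0]            # dL/dW

# Rank-1 identity for a single example: dL/dW == delta^T @ x.
assert torch.allclose(grad_W, delta.T @ x, atol=1e-6)

# Placeholder editor: transforms (x, delta) and applies the edit as a
# rank-1 weight update, mirroring the low-rank form MEND works in.
editor_x = torch.nn.Linear(d_in, d_in)
editor_d = torch.nn.Linear(d_out, d_out)
with torch.no_grad():
    x_t, d_t = editor_x(x), editor_d(delta)
    layer.weight -= 0.1 * (d_t.T @ x_t)   # edited weights
```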